Distribution of MRI-derived T2 values as a biomarker for in vivo rapid screening of phenotype severity in mdx mice

Background The pathology in Duchenne muscular dystrophy (DMD) is characterized by degenerating muscle fibers, inflammation, fibro-fatty infiltrate, and edema, and these pathological processes replace normal healthy muscle tissue. The mdx mouse model is one of the most commonly used preclinical models to study DMD. Mounting evidence has emerged illustrating that muscle disease progression varies considerably in mdx mice, with inter-animal differences as well as intra-muscular differences in pathology in individual mdx mice. This variation is important to consider when conducting assessments of drug efficacy and in longitudinal studies. We developed a magnetic resonance imaging (MRI) segmentation and analysis pipeline to rapidly and non-invasively measure the severity of muscle disease in mdx mice. Methods Wildtype and mdx mice were imaged with MRI and T2 maps were obtained axially across the hindlimbs. A neural network was trained to rapidly and semi-automatically segment the muscle tissue, and the distribution of resulting T2 values was analyzed. Interdecile range and Pearson Skew were identified as biomarkers to quickly and accurately estimate muscle disease severity in mice. Results The semiautomated segmentation tool reduced image processing time approximately tenfold. Measures of Pearson skew and interdecile range based on that segmentation were repeatable and reflected muscle disease severity in healthy wildtype and diseased mdx mice based on both qualitative observation of images and correlation with Evans blue dye uptake. Conclusion Use of this rapid, non-invasive, semi-automated MR image segmentation and analysis pipeline has the potential to transform preclinical studies, allowing for pre-screening of dystrophic mice prior to study enrollment to ensure more uniform muscle disease pathology across treatment groups, improving study outcomes.


Introduction
Duchenne muscular dystrophy (DMD) is an inherited muscle wasting disease caused by loss of function mutations in dystrophin [1,2].Dystrophin, a large protein that links the cytoskeleton with the extracellular matrix, is encoded by the DMD gene.In the absence of dystrophin, the muscle membrane becomes easily disrupted which causes muscles to degenerate, and in its place, the intact muscle is replaced by fibrosis and fat, as well as immune infiltrate.The dystrophin-deficient mdx mouse model on the C57Bl/10 background harbors a naturally occurring mutation in exon 23 of dystrophin resulting in a premature stop codon [3,4].It is the model most commonly used in preclinical therapeutic trials as it recapitulates many of the same pathological features seen in humans with DMD [5].Although the mdxB10 mouse produces no detectable dystrophin protein, this model exhibits milder histological and functional deficits than humans with DMD.A more severe mouse model referred to as mdxD2 was generated through backcrossing the mdxC57Bl/10 model over 5 generations onto the DBA/2J strain [6].The muscle pathology in mdxD2 mice recapitulates additional features of human DMD pathology with increased areas of muscle damage consisting of increased fibrofatty deposition, myofiber necrosis, and immune infiltrate.Along with these findings, mdxD2 mice exhibit impaired repair capacity and greater functional deficits compared to the mdxC57Bl/10 model [7][8][9][10][11].
The availability of multiple mouse models of DMD has allowed for extensive preclinical studies to evaluate potential therapeutics.With preclinical testing occurring across multiple animal colonies, there have been efforts to standardize procedures to increase the reliability and translatability of endpoint measures (TREAT-NMD.orgprotocols).It is well known that the mdx model displays a wide range of both intra-animal variability and inter-animal variability [12][13][14][15].For example, mdx littermates can exhibit virtually no muscle damage or fibrosis in one subject, while another is severely affected.Moreover, within the same animal one muscle may have extensive pathology while the contralateral muscle is minimally affected.This inherent variation can contribute to poor assay sensitivity and necessitates large cohorts of animals to achieve the statistical power required to show treatment effects [16].Spurney et al showed that quantitative histology measures of degeneration and regeneration, and creatine kinase showed high variance in the mdx model.Evans blue dye uptake, as a measure of muscle membrane leak, also showed both animal to animal variability and intra-animal variability across studies [17].
Understanding the mitigating factors that can influence preclinical outcome measures is critical when designing and evaluating therapeutic efficacy studies.Environmental factors such as cage design, light/dark cycle, food, sex, age, and weight can be more easily controlled than biological factors.Non-invasive imaging techniques such as T 2 -weighted magnetic resonance imaging (MRI) can provide information on in vivo muscle tissue health at the outset of a study, without the need for tissue sampling or sacrifice.T 2 is a property of tissue relaxation in the presence of a magnetic field; it tends to be high in the presence of free fluid, and is often abnormally elevated in the presence of inflammation, edema, and necrosis [18].Assigning animals to treatment groups based on visual and qualitative estimates of disease severity risks introducing bias, while a labor-and time-intensive processing protocol is not feasible to implement at the beginning of a large study.Further, it can be challenging to identify a numerical summary statistic that reflects the qualitative differences readily observed between imaging datasets.For example, in datasets with small, localized regions of signal change, signal averaging can obscure differences due to the much larger areas of unchanged tissue.Indeed, previous efforts to use average T 2 values as an imaging biomarker of disease progression in mdx mice have been hampered by high variance that has been interpreted as a lack of significant differences between timepoints or groups, but more likely reflects the inadequacy of bulk averaging as a summary statistic [18].We hypothesize that the distribution of muscle T 2 values reflects features of the underlying disease, and propose a rapid high-throughput screening method that combines MR imaging, semi-automated segmentation, and quantitative analysis with meaningful summary statistics to estimate disease severity in vivo, balance treatment groups, and a priori exclude animals with unusually high or low disease burdens.

Animals
The 6-7 weeks old WTC57Bl/6, mdxB10 and mdxD2 male mice were purchased from the Jackson Laboratory (stock 000664, 001801, and 013141, respectively) and imaged at 8 weeks of age.Mice were housed in a specific pathogen-free facility on a 12-hour light/12-hour dark cycle, fed ad libitum and sacrificed in accordance with the Northwestern University's Institutional Animal Care and Use Committee regulations and the NIH Guide for the Care and Use of Laboratory Animals to minimize distress.

MR image acquisition
Mice were anesthetized in an induction chamber with inhaled isoflurane and then transferred to a dedicated imaging bed with a nosecone to deliver continuous isoflurane.Each mouse was positioned prone with legs tucked beneath the abdomen to reduce susceptibility artifacts.Respiration was monitored using a pressure sensitive pillow and body temperature was maintained with a warm water circulating jacket placed under the animal.MRI was performed on a 9.4T Bruker Biospec 9430 (Bruker Corporation, Billerica, MA, USA) with a 30 cm bore and 12 cm gradient insert, running Paravision 6.0.1.Each mouse was imaged using a 40 mm quadrature volume coil (Bruker) operating in transmit/receive mode.After localizer images were acquired, a T 2 map was acquired using a spin echo sequence (Multi Slice Multi Echo, MSME) oriented axially and centered at the mid-calf.The following parameters were used: TR = 4000 ms, TE = 9-225 ms (30 echoes, echo spacing = 9 ms), MTX = 256 x 256, FOV 3.5 x 3.5 cm, 5 slices, 1 mm slice thickness and 1 signal average.Acquisition time was approximately 18 minutes [19].

Training of segmentation network
Images were imported into Amira 2020.2 software (Thermo Fisher Scientific, Waltham, MA, USA) and the first echo of the T 2 map acquisition (TE = 9 ms), a relatively proton-density weighted image, was used to segment a region of interest (ROI) containing hindlimb and paraspinal muscles, and remove other tissues such as bladder, skin, bone, and intermuscular fat.As this was a time-consuming manual process, a deep learning prediction model was trained in Amira using the built-in tools (Fig 1).A training dataset was assembled using segmentation and image data from 20 mdxB10 scans and 5 WTC57Bl/6 scans, and concatenated with a test dataset assembled from segmentation and image data obtained by rescanning 5 mdxB10 mice and 3 WTC57Bl/6mice 2-3 days after their initial imaging session.A validation dataset was assembled using data from an additional 5 mdxB10 mice and 4 WTC57Bl/6 mice.A preliminary model was trained using the "Deep Learning Training" module in Amira with type Back-bonedUNet, number of classes = 2, resnet18 backbone, patch size 128, batch size 8, max patch overlap 0.3, maximum number of epochs 100, Adam optimizer, learning rate 0.0001, and no data augmentation.A refined model was trained using the same module, initialized with the weights of the preliminary model, with the following modifications: patch size 256, maximum number of epochs 500, learning rate 0.0005, and data augmentation with geometry transforms including horizontal and vertical flip, 30-degree rotation, 10% zoom, and 10 degree shear, and an early stopping criterion to stop if no model improvement for 25 iterations.The "Segmentation Metrics" Amira Xtra module was used to assess the performance of the model on the validation dataset [20].

Semiautomated image segmentation
After the deep learning model was generated, the first echo of the T 2 map acquisition was imported into Amira and segmented using the "Deep Learning Prediction" module using the established model weights.The segmentation was inspected by a trained observer and manually refined as necessary to correct minor errors (e.g., incomplete removal of the bladder).Muscle ROIs were exported from Amira as a mask image to be used in further processing steps.For consistency, all manually segmented datasets used in the study were re-segmented using the semiautomatic segmentation process.

MR image processing
Using a custom script via the Amira Python console, all echoes of the T 2 map acquisition were masked to include only muscle voxels using the mask image generated in the segmentation step.The masked data were then imported into JIM 7 (Xinapse Systems Ltd, West Bergholt, Essex, UK) and the apparent T 2 was fit using the built in nonlinear curve fitting module.The model was monoexponential, of the form: where S is the measured signal, S 0 is the fully recovered signal, TE is the specified echo time, and T 2 (the quantity being fit) is the transverse relaxation time.A custom Python script was used to extract a list of muscle T 2 values from the resultant maps.Image slices with significant respiratory artifact were excluded.The top and bottom 1% of T 2 values were excluded to reduce the effect of outliers on summary statistics.For each set of T 2 values, the interdecile range (IDR, the difference between the 90 th percentile value and the 10 th percentile value) was calculated as a measure of distribution spread.The mode (i.e.location of the peak) was computed by fitting a kernel density function to the measured T 2 values and finding the T 2 value corresponding to the peak of the distribution.The Pearson Mode Skewness was calculated as a measure of distribution symmetry [21].

mean À mode standard deviation
Eq2

Repeatability of image metrics
To assess the repeatability of measurements of Pearson skew and IDR of the T 2 distribution, 5 mdxB10 mice and 3 WT mice were reimaged two days after their initial imaging session.Both sets of data were analyzed according to the protocol described above.

Evans blue dye
To determine if the MR image-based measures, Pearson skew and IDR, correlate with Evans blue dye uptake measures, mdxB10, mdxD2, and WT mice were subjected to MR imaging and subsequent evaluation for Evans blue dye uptake in hindlimb muscles (quadriceps, glut/ham, gas/sol).WT mice (n = 5), mdxB10 mice (n = 4) and mdxD2 mice (n = 3) underwent the MRI protocol described above followed by measurements of Evans Blue dye uptake that were quantified as described previously [22][23][24].Briefly, mice were injected with 5 μl per gram of body weight of 10 μM Evans Blue dye (E2129, Sigma-Aldrich).Mice were euthanized, and tissue was excised, minced, and placed in 1ml of formamide in a 24-well plate placed at 55˚C for 2 hrs.Absorbance was measured at 620 nm on a Synergy HTX multi-mode plate reader (Bio-Tek1) from 200 μl of sample solution in a 96-well plate.Each sample was assessed in duplicate.Results are reported as the average arbitrary optical density units per mg of tissue.

Statistical analysis
A key purpose of this study was to identify and validate relevant summary statistics for MRI image distributions based on the hypothesis that the distribution of T 2 values would be sufficiently non-normal as to render mean and standard deviation inappropriate summary statistics.A custom Python script was used to test the normality of the distribution of T 2 values for each animal using D'Agostino's K2 test, and compute summary statistics [25][26][27][28][29].All distributions observed were markedly non-normal (p < 0.001).Summary statistics of Pearson mode skew and IDR were investigated to describe the width and asymmetry of distributions and enable quantitative comparisons.Test-retest variability was assessed using Bland-Altman analysis and the comparisons of Pearson skew and IDR with Evans Blue dye uptake were calculated using a simple linear regression model [30].

Development of a semi-automated muscle segmentation tool
Manual segmentation of the muscle ROI including the hindlimb and paraspinal muscles was a lengthy manual process due to the need to manually separate muscle from other tissues including the peritoneal cavity, testes, bone, and regions of normal fluid and fat found subcutaneously and at intermuscular interfaces.Because of the manual and subjective nature of this process, it was both time-consuming (>1hr per animal) and subject to variability.Use of the deep learning model reduced the time for generating muscle ROIs to a few seconds and reduced total hands-on analyst time to approximately 10 minutes.In the validation dataset, the model-predicted label field matched the ground truth label field with a Jaccard Index of 0.97, indicating excellent agreement.An example segmentation of a dataset not included in the training data is shown in Fig 2.

Image processing
A total of 39 mice were imaged in 47 imaging sessions.Of the 47 T 2 maps, one slice was excluded from each of 10 maps due to respiratory artifact, and two slices were excluded from one map.After bounding the masked and T 2 -fitted data to eliminate the bottom 1% and top 1% of T 2 values for each imaging session, T 2 values ranged from 19.3-81.4seconds.

Stratification of muscle disease through semi-automated MR imaging analysis
The range of disease severity present in this cohort of same-aged and same-sex mdxB10 mice is evident upon examination of the T 2 maps shown in Fig 3. ).In all cases, the distribution of T 2 values is observed to be non-normal (as confirmed by D'Agostino's K 2 test of normality), asymmetric, and poorly characterized by the Gaussian distribution.The shape of the distribution markedly differs between the different animals, with wider, more asymmetric distributions corresponding to more severe disease, and narrower distributions corresponding to milder disease.A plot of Pearson skew vs. IDR of the distribution of T 2 values was effective in stratifying disease severity and correlated well with visual assessment of the selected T 2 maps (Fig 4).Together these data show that MR image-based measures extracted using the newly developed, semi-automated segmentation and analysis pipeline stratify muscle disease severity across healthy to severe pathology.

Semi-automated MR image-based measures are reproducible across disease severity
Both Pearson skew and IDR were found to be highly repeatable across imaging sessions (Fig 5).Based on repeated measures across 5 mdxB10 and 3 WT C57Bl/6 animals, the minimum detectable change for IDR was 1.75 seconds (12.6% of the average value).The minimum detectable change for Pearson skew was 0.03 (6% of the average value).IDR exhibited a strong linear correlation with the Evans blue dye with R 2 = 0.9 (p < 0.001).Pearson skew was also linearly correlated with R 2 = 0.71 (p < 0.001).This suggests that the in vivo imaging based metrics can serve as a surrogate measure of muscle membrane health and integrity as compared to the terminal Evans blue dye uptake assessment.

Discussion
We have outlined a non-invasive, quick, and reproducible imaging and analysis protocol based on the distribution of hindlimb T 2 values that can accurately stratify animals based on disease severity to aid in preclinical therapeutic study design.The associated code for processing and analysis are available online at https://doi.org/10.18131/x17wx-02s47[31].Screening animals prior to assigning treatment groups can help mitigate potential confounding effects of the wide range of disease severities in the mdx model, as the true treatment effects can be obscured if there are differences in the underlying severity of disease between the groups.Many studies rely on comparing histopathological findings or Evans blue dye uptake in separate groups of untreated and treated animals.Because these measures rely on terminal procedures that cannot be undertaken at the outset of a study, they are especially susceptible to baseline variation between experimental groups.Elevated T 2 values reflect regions of muscle disturbance that have previously been correlated with muscle fiber damage in mdx mice [17,32,33].Several previous efforts to use MRI in the assessment of disease severity in mdx mice have focused qualitatively on T 2 weighted images; researchers attempting to use T 2 maps quantitatively have consistently noted very small differences in the mean value of muscle T 2 (on the order of a few milliseconds) between mdx and wildtype animals with very large standard deviations that confound statistical differentiation between groups [18,34].We compared the gaussian distribution described by the mean and standard deviation of measured muscle T 2 values to the actual measured distribution and found that not only was the gaussian distribution a poor descriptor, it masked features useful for stratifying the severity of disease in mdx animals.Attempts to differentiate the groups by counting voxels above a specified T 2 threshold were highly dependent on threshold selection and ineffective at stratifying disease.
Inspection of the distribution of muscle T 2 values in mdx and WT mice identified stark differences in the peak shape and distribution width.All distributions were markedly non-normal, confirming that mean and standard deviation were inappropriate summary statistics.In WT mice, the peak was sharp and narrow, with the mode (location of the peak) occurring at a low T 2 value corresponding to healthy muscle.The distribution had a shallow tail of higher values corresponding to blood vessels, inter-muscular interfaces, and noise.In contrast, for the mdxB10 mice, the modes occurred at higher T 2 values, and the distributions were more asymmetric with heavy tails of high T 2 values corresponding to regions of edema and necrosis.In mdxD2 mice, the modes more closely resembled those of WT mice, but the distributions were asymmetric with heavy tails of high T 2 values corresponding to damaged tissue.Although beyond the scope of this study, further investigation is warranted to elucidate the biological underpinning of the differences between T 2 distributions observed in mdxD2 and mdxB10 mice.
We hypothesized that IDR and Pearson skew would be suitable metrics to assess the width and tail heaviness of T 2 distributions and therefore the burden of muscle damage.IDR measures distribution spread and captures the magnitude of the highest T 2 values, and Pearson skew detects a disproportionately high percentage of high-T 2 values.Since each metric describes a slightly different aspect of the distribution, we plotted them in combination to create a spectrum of disease severity.The metrics clearly and repeatably differentiated mdx animals from wildtype animals, as well as differentiating mdx animals with severe disease from those with mild disease (Fig 4).Importantly, Pearson skew and IDR correlated well with both qualitative observation of T 2 weighted MR images and the terminal measurements made using Evans blue dye (Fig 6).Quantification of Evans blue dye uptake has been commonly used to evaluate muscle membrane permeability and stability as a measure of muscle health, as the dye is excluded from healthy intact muscle.
In future studies, this distribution analysis will be extended to a more comprehensive radiomics approach to identify texture-based image features that reflect localized areas of muscle damage [35].It may be extended to a more in-depth per-muscle analysis, which was explicitly not undertaken in this study as the focus was on developing methods for rapid screening of global animal phenotype severity.A major strength of this rapid semiautomated image segmentation and analysis pipeline is that images of several major hindlimb muscle groups can be acquired in approximately 30 minutes, with only 10-15 minutes per animal needed for semiautomated image segmentation and processing.This significantly increases the throughput relative to manual segmentation and processing, which required 1.5-2 hours per animal, and renders it suitable for screening of cohorts of animals prior to assigning treatment groups.The proposed pipeline could substantially improve the quality of studies of therapeutic efficacy in mouse models of muscular dystrophy by ensuring that study cohorts are appropriately

Fig 2 .
Fig 2. Reliable semi-automatic muscle segmentation.(A) Proton density weighted axial image showing hindlimb and paraspinal musculature.Lower hindlimb muscle groups visualized include gastrocnemius and soleus (gas/sol); upper hindlimb muscle groups include quadriceps and hamstrings (quad/ham) and gluteal muscles (glut, near pelvis).(B) Manual segmentation of muscle ROI.(C) Result of segmentation using the trained deep learning model.(D) Difference between manual and automated segmentation showing overall excellent agreement with minor variation around the muscle-skin and muscle-bone interfaces.https://doi.org/10.1371/journal.pone.0310551.g002 Fig 3A illustrates an mdxD2 mouse, which is known to have a relatively severe muscle disease phenotype.Fig 3B illustrates an mdxB10 mouse with severe muscle disease as indicated by extensive regions of mildly elevated T 2 , in addition to several sizable regions of highly elevated T 2 .In contrast, Fig 3C illustrates a mdxB10 mouse with comparatively mild disease, in which there are a few small focal regions with mildly elevated T 2 , but few voxels with highly elevated T 2 .The healthy wildtype mouse shown in Fig 3D has few voxels with high T 2 and these are primarily located at the intermuscular interfaces.For each T 2 map, the distribution of T 2 values is plotted immediately to the right, with the Gaussian distribution corresponding to the data mean and standard deviation overplotted as a dashed line (Fig 3A-3D

Fig 3 .Fig 4 .Fig 5 .
Fig 3. MR image-based measures stratify muscle disease severity.(A) Representative image of a mdxD2 animal with a severe muscle disease phenotype as evidenced by localized areas of high T 2 corresponding to muscle damage (yellow).(B) Representative image of a mdxB10 animal with a severe disease phenotype seen by an overall increase in T 2 values combined with distinct regions of elevated T 2 corresponding to edema.(C) Representative image of a mdxB10 animal with a mild disease phenotype evidenced by a few small focal regions of mildly elevated T 2 and minimal highly elevated T 2 voxels.(D) Representative image of a healthy WT C57Bl/6 control animal showing minimal regions of high or mildly elevated T 2 aside from normal intramuscular interfaces.Plots to the right of each image show the histogram of T 2 values for that animal.The overplotted red dashed lines correspond to the gaussian distribution defined by the mean and standard deviation of the T 2 distribution, illustrating its inadequacy to describe the actual data.https://doi.org/10.1371/journal.pone.0310551.g003 Good agreement was observed between the in vivo imaging based metrics (Pearson skew and IDR) measured through the semi-automated MR image segmentation, and the ex vivo Evans Blue dye uptake measurement (Fig 6).Based on the Pearson skew and IDR values shown in Fig 4, the mdxB10 mice used in this experiment had moderate levels of disease.Healthy WT muscle had the lowest level of Evans blue dye uptake and IDR / Pearson skew values followed by mdxB10, with mdxD2 muscle taking up the highest levels of Evans blue dye and having the highest IDR / Pearson skew values.

Fig 6 .
Fig 6.Strong correlation between Evans blue uptake and MRI measures across muscle disease severity.Comparison of noninvasive in vivo imagebased measurements of disease severity IDR (A) and Pearsons skew (B) with ex vivo measurement of Evans blue dye.No muscle disease occurred in healthy WT C57Bl/6 controls (n = 5), mild disease was present in mdxB10 mice (n = 4), and severe disease was evident in mdxD2 mice (n = 3).Both measures exhibited a strong linear correlation with the Evans blue dye (R 2 vs IDR: 0.89; R 2 vs Pearson skew: 0.86;).https://doi.org/10.1371/journal.pone.0310551.g006